Overview

Dataset statistics

Number of variables: 19
Number of observations: 603416
Missing cells: 344234
Missing cells (%): 3.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 92.1 MiB
Average record size in memory: 160.0 B
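
The two derived figures in this block (missing-cell percentage and average record size) follow from the raw counts. The sketch below recomputes them from the numbers reported here; the variable names are illustrative, and the tool's own rounding is not reproduced exactly.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 19
n_observations = 603416
missing_cells = 344234
total_size_mib = 92.1

# Missing cells (%) is the share of all cells (rows x columns) that are empty.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size is the total memory divided by the number of rows.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Because the total size is itself rounded to 0.1 MiB, the recomputed record size only agrees with the reported value to about one decimal place.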

Variable types

Categorical: 8
Numeric: 11

Warnings

DBN has a high cardinality: 1631 distinct values (High cardinality)
School Name has a high cardinality: 1627 distinct values (High cardinality)
% Attendance is highly correlated with % Attendance_5_yr_avg (High correlation)
% Chronically Absent is highly correlated with % Chronically Absent_5_yr_avg (High correlation)
Next Year % Chronically Absent is highly correlated with % Chronically Absent_5_yr_avg (High correlation)
% Attendance_2_yr_avg is highly correlated with % Attendance_5_yr_avg (High correlation)
% Chronically Absent_2_yr_avg is highly correlated with % Chronically Absent_5_yr_avg (High correlation)
% Attendance_5_yr_avg is highly correlated with % Attendance and 1 other fields (High correlation)
% Chronically Absent_5_yr_avg is highly correlated with % Chronically Absent and 2 other fields (High correlation)
Borough_Name is highly correlated with Borough_Code (High correlation)
Borough_Code is highly correlated with Borough_Name (High correlation)
Next Year % Chronically Absent has 149460 (24.8%) missing values (Missing)
Chronically_Absent_Next_Year has 149460 (24.8%) missing values (Missing)
% Attendance_2_yr_avg has 20151 (3.3%) missing values (Missing)
% Chronically Absent_2_yr_avg has 20151 (3.3%) missing values (Missing)
% Chronically Absent has 18887 (3.1%) zeros (Zeros)
Next Year % Chronically Absent has 11965 (2.0%) zeros (Zeros)
% Chronically Absent_2_yr_avg has 7352 (1.2%) zeros (Zeros)
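
The percentages in the missing-value and zero warnings are each count divided by the number of observations. A small sketch that recomputes them from the counts listed in the warnings (the `flagged` list and `share` helper are illustrative names, not part of pandas-profiling):

```python
n_observations = 603416

# (variable, kind, count) triples copied from the warnings in this report.
flagged = [
    ("Next Year % Chronically Absent", "missing", 149460),
    ("% Attendance_2_yr_avg", "missing", 20151),
    ("% Chronically Absent", "zeros", 18887),
    ("Next Year % Chronically Absent", "zeros", 11965),
    ("% Chronically Absent_2_yr_avg", "zeros", 7352),
]

def share(count, total=n_observations):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)

shares = {(name, kind): share(count) for name, kind, count in flagged}
for (name, kind), pct in shares.items():
    print(f"{name}: {pct}% {kind}")
```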

Reproduction

Analysis started: 2021-01-11 23:57:47.464896
Analysis finished: 2021-01-12 00:03:38.527703
Duration: 5 minutes and 51.06 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml
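
The reported duration is the difference between the two timestamps. A minimal check using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction block of this report.
started = datetime.fromisoformat("2021-01-11 23:57:47.464896")
finished = datetime.fromisoformat("2021-01-12 00:03:38.527703")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```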

Variables

DBN
Categorical

HIGH CARDINALITY

Distinct: 1631
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 MiB

31R080 | 945
01M539 | 885
75Q993 | 797
21K095 | 763
30Q122 | 760
Other values (1626) | 599266

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 3620496
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 01M015
2nd row: 01M015
3rd row: 01M015
4th row: 01M015
5th row: 01M015
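
The character categories counted for this variable are Unicode general categories. As a rough illustration (not the profiler's own code), Python's standard `unicodedata` module reports the category of each character in a sample DBN value; scripts and blocks are not exposed by that module, so they are left out here.

```python
import unicodedata
from collections import Counter

sample = "01M015"  # sample DBN value shown in the report
n_observations = 603416

# unicodedata.category returns a two-letter general category code:
# "Nd" is Decimal Number, "Lu" is Uppercase Letter.
categories = Counter(unicodedata.category(ch) for ch in sample)
print(categories)

# If every DBN has five digits and one letter, the column-wide totals follow.
digits_total = categories["Nd"] * n_observations
letters_total = categories["Lu"] * n_observations
print(digits_total, letters_total)
```

The two totals agree with the Decimal Number and Uppercase Letter counts reported for DBN (3017080 and 603416), which is consistent with every value having the same five-digit, one-letter shape.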
Value | Count | Frequency (%)
31R080 | 945 | 0.2%
01M539 | 885 | 0.1%
75Q993 | 797 | 0.1%
21K095 | 763 | 0.1%
30Q122 | 760 | 0.1%
25Q219 | 748 | 0.1%
22K206 | 747 | 0.1%
20K104 | 742 | 0.1%
21K225 | 740 | 0.1%
21K226 | 736 | 0.1%
Other values (1621) | 595553 | 98.7%
Histogram of lengths of the category
ValueCountFrequency (%)
31r080945
 
0.2%
01m539885
 
0.1%
75q993797
 
0.1%
21k095763
 
0.1%
30q122760
 
0.1%
25q219748
 
0.1%
22k206747
 
0.1%
20k104742
 
0.1%
21k225740
 
0.1%
21k226736
 
0.1%
Other values (1621)595553
98.7%

Most occurring characters

ValueCountFrequency (%)
0531023
14.7%
1519950
14.4%
2494732
13.7%
3283136
7.8%
5245159
6.8%
4222205
 
6.1%
7194560
 
5.4%
9179402
 
5.0%
6179191
 
4.9%
K178769
 
4.9%
Other values (5)592369
16.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number3017080
83.3%
Uppercase Letter603416
 
16.7%

Most frequent character per category

ValueCountFrequency (%)
0531023
17.6%
1519950
17.2%
2494732
16.4%
3283136
9.4%
5245159
8.1%
4222205
7.4%
7194560
 
6.4%
9179402
 
5.9%
6179191
 
5.9%
8167722
 
5.6%
ValueCountFrequency (%)
K178769
29.6%
Q148530
24.6%
X133381
22.1%
M110824
18.4%
R31912
 
5.3%

Most occurring scripts

ValueCountFrequency (%)
Common3017080
83.3%
Latin603416
 
16.7%

Most frequent character per script

ValueCountFrequency (%)
0531023
17.6%
1519950
17.2%
2494732
16.4%
3283136
9.4%
5245159
8.1%
4222205
7.4%
7194560
 
6.4%
9179402
 
5.9%
6179191
 
5.9%
8167722
 
5.6%
ValueCountFrequency (%)
K178769
29.6%
Q148530
24.6%
X133381
22.1%
M110824
18.4%
R31912
 
5.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3620496
100.0%

Most frequent character per block

ValueCountFrequency (%)
0531023
14.7%
1519950
14.4%
2494732
13.7%
3283136
7.8%
5245159
6.8%
4222205
 
6.1%
7194560
 
5.4%
9179402
 
5.0%
6179191
 
4.9%
K178769
 
4.9%
Other values (5)592369
16.4%

School Name
Categorical

HIGH CARDINALITY

Distinct: 1627
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 MiB

P.S. 212 | 1000
P.S. 253 | 950
The Michael J. Petrides School | 945
New Explorations into Science, Technology and Math | 885
P.S. Q993 | 797
Other values (1622) | 598839

Length

Max length: 50
Median length: 26
Mean length: 27.2181563
Min length: 5

Characters and Unicode

Total characters: 16423871
Distinct characters: 74
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: P.S. 015 Roberto Clemente
2nd row: P.S. 015 Roberto Clemente
3rd row: P.S. 015 Roberto Clemente
4th row: P.S. 015 Roberto Clemente
5th row: P.S. 015 Roberto Clemente
Value | Count | Frequency (%)
P.S. 212 | 1000 | 0.2%
P.S. 253 | 950 | 0.2%
The Michael J. Petrides School | 945 | 0.2%
New Explorations into Science, Technology and Math | 885 | 0.1%
P.S. Q993 | 797 | 0.1%
P.S. 095 The Gravesend | 763 | 0.1%
P.S. 122 Mamie Fay | 760 | 0.1%
P.S. 219 Paul Klapper | 748 | 0.1%
P.S. 206 Joseph F Lamb | 747 | 0.1%
P.S./I.S. 104 The Fort Hamilton School | 742 | 0.1%
Other values (1617) | 595079 | 98.6%
Histogram of lengths of the category
ValueCountFrequency (%)
p.s315877
 
11.5%
school212281
 
7.7%
the85246
 
3.1%
high72197
 
2.6%
for58116
 
2.1%
academy56152
 
2.0%
and40069
 
1.5%
of37497
 
1.4%
28244
 
1.0%
bronx19556
 
0.7%
Other values (2047)1826951
66.4%

Most occurring characters

ValueCountFrequency (%)
2149195
 
13.1%
o1094775
 
6.7%
e1072554
 
6.5%
.890224
 
5.4%
a803404
 
4.9%
S729290
 
4.4%
r713156
 
4.3%
l707878
 
4.3%
n705911
 
4.3%
i611049
 
3.7%
Other values (64)6946435
42.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter9482943
57.7%
Uppercase Letter2697399
 
16.4%
Space Separator2149195
 
13.1%
Decimal Number1110749
 
6.8%
Other Punctuation957395
 
5.8%
Dash Punctuation21341
 
0.1%
Open Punctuation3062
 
< 0.1%
Close Punctuation1787
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
S729290
27.0%
P409824
15.2%
H166579
 
6.2%
A163245
 
6.1%
C136037
 
5.0%
M130394
 
4.8%
T130253
 
4.8%
B105152
 
3.9%
E97566
 
3.6%
L80184
 
3.0%
Other values (16)548875
20.3%
ValueCountFrequency (%)
o1094775
11.5%
e1072554
11.3%
a803404
 
8.5%
r713156
 
7.5%
l707878
 
7.5%
n705911
 
7.4%
i611049
 
6.4%
h597790
 
6.3%
c503572
 
5.3%
t455917
 
4.8%
Other values (16)2216937
23.4%
ValueCountFrequency (%)
1211253
19.0%
0210355
18.9%
2145962
13.1%
3100034
9.0%
577196
 
6.9%
976649
 
6.9%
475575
 
6.8%
674733
 
6.7%
872821
 
6.6%
766171
 
6.0%
ValueCountFrequency (%)
.890224
93.0%
/24246
 
2.5%
,20117
 
2.1%
&9555
 
1.0%
'6826
 
0.7%
:6077
 
0.6%
\276
 
< 0.1%
@74
 
< 0.1%
ValueCountFrequency (%)
2149195
100.0%
ValueCountFrequency (%)
-21341
100.0%
ValueCountFrequency (%)
(3062
100.0%
ValueCountFrequency (%)
)1787
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin12180342
74.2%
Common4243529
 
25.8%

Most frequent character per script

ValueCountFrequency (%)
o1094775
 
9.0%
e1072554
 
8.8%
a803404
 
6.6%
S729290
 
6.0%
r713156
 
5.9%
l707878
 
5.8%
n705911
 
5.8%
i611049
 
5.0%
h597790
 
4.9%
c503572
 
4.1%
Other values (42)4640963
38.1%
ValueCountFrequency (%)
2149195
50.6%
.890224
21.0%
1211253
 
5.0%
0210355
 
5.0%
2145962
 
3.4%
3100034
 
2.4%
577196
 
1.8%
976649
 
1.8%
475575
 
1.8%
674733
 
1.8%
Other values (12)232353
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII16423871
100.0%

Most frequent character per block

ValueCountFrequency (%)
2149195
 
13.1%
o1094775
 
6.7%
e1072554
 
6.5%
.890224
 
5.4%
a803404
 
4.9%
S729290
 
4.4%
r713156
 
4.3%
l707878
 
4.3%
n705911
 
4.3%
i611049
 
3.7%
Other values (64)6946435
42.3%

Grade
Categorical

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 MiB

All Grades | 117285
1 | 47545
2 | 47314
0K | 46364
3 | 46160
Other values (10) | 298748

Length

Max length: 18
Median length: 1
Mean length: 3.494933843
Min length: 1

Characters and Unicode

Total characters: 2108899
Distinct characters: 28
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: All Grades
2nd row: PK in K-12 Schools
3rd row: 0K
4th row: 1
5th row: 2
Value | Count | Frequency (%)
All Grades | 117285 | 19.4%
1 | 47545 | 7.9%
2 | 47314 | 7.8%
0K | 46364 | 7.7%
3 | 46160 | 7.6%
4 | 45008 | 7.5%
5 | 43816 | 7.3%
6 | 28560 | 4.7%
8 | 28508 | 4.7%
9 | 28460 | 4.7%
Other values (5) | 124396 | 20.6%
Histogram of lengths of the category
ValueCountFrequency (%)
all117285
15.1%
grades117285
15.1%
147545
 
6.1%
247314
 
6.1%
0k46364
 
6.0%
346160
 
5.9%
445008
 
5.8%
543816
 
5.6%
628560
 
3.7%
828508
 
3.7%
Other values (9)210465
27.0%

Most occurring characters

ValueCountFrequency (%)
l253773
 
12.0%
174894
 
8.3%
1168664
 
8.0%
s136488
 
6.5%
A117285
 
5.6%
G117285
 
5.6%
r117285
 
5.6%
a117285
 
5.6%
d117285
 
5.6%
e117285
 
5.6%
Other values (18)671370
31.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter974619
46.2%
Decimal Number582437
27.6%
Uppercase Letter357746
 
17.0%
Space Separator174894
 
8.3%
Dash Punctuation19203
 
0.9%

Most frequent character per category

ValueCountFrequency (%)
l253773
26.0%
s136488
14.0%
r117285
12.0%
a117285
12.0%
d117285
12.0%
e117285
12.0%
o38406
 
3.9%
i19203
 
2.0%
n19203
 
2.0%
c19203
 
2.0%
ValueCountFrequency (%)
1168664
29.0%
291391
15.7%
073780
12.7%
346160
 
7.9%
445008
 
7.7%
543816
 
7.5%
628560
 
4.9%
828508
 
4.9%
928460
 
4.9%
728090
 
4.8%
ValueCountFrequency (%)
A117285
32.8%
G117285
32.8%
K84770
23.7%
P19203
 
5.4%
S19203
 
5.4%
ValueCountFrequency (%)
174894
100.0%
ValueCountFrequency (%)
-19203
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1332365
63.2%
Common776534
36.8%

Most frequent character per script

ValueCountFrequency (%)
l253773
19.0%
s136488
10.2%
A117285
8.8%
G117285
8.8%
r117285
8.8%
a117285
8.8%
d117285
8.8%
e117285
8.8%
K84770
 
6.4%
o38406
 
2.9%
Other values (6)115218
8.6%
ValueCountFrequency (%)
174894
22.5%
1168664
21.7%
291391
11.8%
073780
9.5%
346160
 
5.9%
445008
 
5.8%
543816
 
5.6%
628560
 
3.7%
828508
 
3.7%
928460
 
3.7%
Other values (2)47293
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2108899
100.0%

Most frequent character per block

ValueCountFrequency (%)
l253773
 
12.0%
174894
 
8.3%
1168664
 
8.0%
s136488
 
6.5%
A117285
 
5.6%
G117285
 
5.6%
r117285
 
5.6%
a117285
 
5.6%
d117285
 
5.6%
e117285
 
5.6%
Other values (18)671370
31.8%

Year
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 MiB

2018-19 | 102645
2017-18 | 102496
2016-17 | 102411
2015-16 | 100911
2014-15 | 98739

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 4223912
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013-14
2nd row: 2013-14
3rd row: 2013-14
4th row: 2013-14
5th row: 2013-14
Value | Count | Frequency (%)
2018-19 | 102645 | 17.0%
2017-18 | 102496 | 17.0%
2016-17 | 102411 | 17.0%
2015-16 | 100911 | 16.7%
2014-15 | 98739 | 16.4%
2013-14 | 96214 | 15.9%
Histogram of lengths of the category
ValueCountFrequency (%)
2018-19102645
17.0%
2017-18102496
17.0%
2016-17102411
17.0%
2015-16100911
16.7%
2014-1598739
16.4%
2013-1496214
15.9%

Most occurring characters

ValueCountFrequency (%)
11206832
28.6%
2603416
14.3%
0603416
14.3%
-603416
14.3%
8205141
 
4.9%
7204907
 
4.9%
6203322
 
4.8%
5199650
 
4.7%
4194953
 
4.6%
9102645
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number3620496
85.7%
Dash Punctuation603416
 
14.3%

Most frequent character per category

ValueCountFrequency (%)
11206832
33.3%
2603416
16.7%
0603416
16.7%
8205141
 
5.7%
7204907
 
5.7%
6203322
 
5.6%
5199650
 
5.5%
4194953
 
5.4%
9102645
 
2.8%
396214
 
2.7%
ValueCountFrequency (%)
-603416
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common4223912
100.0%

Most frequent character per script

ValueCountFrequency (%)
11206832
28.6%
2603416
14.3%
0603416
14.3%
-603416
14.3%
8205141
 
4.9%
7204907
 
4.9%
6203322
 
4.8%
5199650
 
4.7%
4194953
 
4.6%
9102645
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII4223912
100.0%

Most frequent character per block

ValueCountFrequency (%)
11206832
28.6%
2603416
14.3%
0603416
14.3%
-603416
14.3%
8205141
 
4.9%
7204907
 
4.9%
6203322
 
4.8%
5199650
 
4.7%
4194953
 
4.6%
9102645
 
2.4%

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 MiB

All Students | 61409
Female | 58917
Male | 58760
Hispanic | 52350
SWD | 52001
Other values (9) | 319979

Length

Max length: 12
Median length: 6
Mean length: 6.603374786
Min length: 3

Characters and Unicode

Total characters: 3984582
Distinct characters: 32
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: All Students
2nd row: All Students
3rd row: All Students
4th row: All Students
5th row: All Students
Value | Count | Frequency (%)
All Students | 61409 | 10.2%
Female | 58917 | 9.8%
Male | 58760 | 9.7%
Hispanic | 52350 | 8.7%
SWD | 52001 | 8.6%
Not SWD | 50016 | 8.3%
Poverty | 47929 | 7.9%
Not Poverty | 46919 | 7.8%
Not ELL | 40463 | 6.7%
Black | 40209 | 6.7%
Other values (4) | 94443 | 15.7%
Histogram of lengths of the category
ValueCountFrequency (%)
not137398
17.1%
swd102017
12.7%
poverty94848
11.8%
ell77411
9.6%
students61409
7.7%
all61409
7.7%
female58917
7.3%
male58760
7.3%
hispanic52350
 
6.5%
black40209
 
5.0%
Other values (3)57495
7.2%

Most occurring characters

ValueCountFrequency (%)
t387987
 
9.7%
e365774
 
9.2%
l280704
 
7.0%
a234808
 
5.9%
o232246
 
5.8%
198807
 
5.0%
S163426
 
4.1%
L154822
 
3.9%
i153569
 
3.9%
n138331
 
3.5%
Other values (22)1674108
42.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2624696
65.9%
Uppercase Letter1161079
29.1%
Space Separator198807
 
5.0%

Most frequent character per category

ValueCountFrequency (%)
t387987
14.8%
e365774
13.9%
l280704
10.7%
a234808
8.9%
o232246
8.8%
i153569
 
5.9%
n138331
 
5.3%
s138331
 
5.3%
r103474
 
3.9%
v94848
 
3.6%
Other values (8)494624
18.8%
ValueCountFrequency (%)
S163426
14.1%
L154822
13.3%
N137398
11.8%
W126314
10.9%
D102017
8.8%
P94848
8.2%
A85981
7.4%
E77411
6.7%
F58917
 
5.1%
M58760
 
5.1%
Other values (3)101185
8.7%
ValueCountFrequency (%)
198807
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3785775
95.0%
Common198807
 
5.0%

Most frequent character per script

ValueCountFrequency (%)
t387987
 
10.2%
e365774
 
9.7%
l280704
 
7.4%
a234808
 
6.2%
o232246
 
6.1%
S163426
 
4.3%
L154822
 
4.1%
i153569
 
4.1%
n138331
 
3.7%
s138331
 
3.7%
Other values (21)1535777
40.6%
ValueCountFrequency (%)
198807
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3984582
100.0%

Most frequent character per block

ValueCountFrequency (%)
t387987
 
9.7%
e365774
 
9.2%
l280704
 
7.0%
a234808
 
5.9%
o232246
 
5.8%
198807
 
5.0%
S163426
 
4.1%
L154822
 
3.9%
i153569
 
3.9%
n138331
 
3.5%
Other values (22)1674108
42.0%

# Days Absent
Real number (ℝ≥0)

Distinct: 16793
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1495.26026
Minimum: 0
Maximum: 105055
Zeros: 7
Zeros (%): < 0.1%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 97
Q1: 299
median: 647
Q3: 1425
95-th percentile: 6026
Maximum: 105055
Range: 105055
Interquartile range (IQR): 1126

Descriptive statistics

Standard deviation: 2873.307826
Coefficient of variation (CV): 1.921610507
Kurtosis: 117.776258
Mean: 1495.26026
Median Absolute Deviation (MAD): 430
Skewness: 7.939262911
Sum: 902263965
Variance: 8255897.862
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
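
Several of the quantile and descriptive statistics for # Days Absent are simple functions of the others. The sketch below recomputes them from the reported figures as a consistency check; it assumes the usual definitions (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean x number of rows) and is not taken from the profiler's source.

```python
# Figures copied from the # Days Absent statistics in this report.
n = 603416
mean = 1495.26026
std = 2873.307826
q1, q3 = 299, 1425
minimum, maximum = 0, 105055

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
total = mean * n                 # sum of all values, up to rounding of the mean

print(iqr, value_range, round(cv, 6), round(variance, 1), round(total))
```

Small differences in the last digits are expected, because the inputs are already rounded in the report.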
Common values
Value | Count | Frequency (%)
158 | 685 | 0.1%
126 | 680 | 0.1%
121 | 664 | 0.1%
115 | 661 | 0.1%
141 | 658 | 0.1%
185 | 657 | 0.1%
174 | 653 | 0.1%
227 | 649 | 0.1%
201 | 649 | 0.1%
156 | 648 | 0.1%
Other values (16783) | 596812 | 98.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 7 | < 0.1%
2 | 1 | < 0.1%
3 | 5 | < 0.1%
4 | 4 | < 0.1%
5 | 5 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
105055 | 1 | < 0.1%
101087 | 1 | < 0.1%
86903 | 1 | < 0.1%
86895 | 1 | < 0.1%
86513 | 1 | < 0.1%

# Days Present
Real number (ℝ≥0)

Distinct: 80995
Distinct (%): 13.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16902.32221
Minimum: 8
Maximum: 934266
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 8
5-th percentile: 1362
Q1: 3680
median: 8017
Q3: 16040.25
95-th percentile: 67067
Maximum: 934266
Range: 934258
Interquartile range (IQR): 12360.25

Descriptive statistics

Standard deviation: 30097.53948
Coefficient of variation (CV): 1.780674815
Kurtosis: 76.22062149
Mean: 16902.32221
Median Absolute Deviation (MAD): 5144
Skewness: 6.426763123
Sum: 1.019913166 × 10^10
Variance: 905861882.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1020 | 91 | < 0.1%
997 | 89 | < 0.1%
1525 | 87 | < 0.1%
1209 | 84 | < 0.1%
1032 | 84 | < 0.1%
1039 | 84 | < 0.1%
1013 | 82 | < 0.1%
1198 | 82 | < 0.1%
1184 | 82 | < 0.1%
1366 | 81 | < 0.1%
Other values (80985) | 602570 | 99.9%

Minimum 5 values
Value | Count | Frequency (%)
8 | 1 | < 0.1%
21 | 1 | < 0.1%
95 | 1 | < 0.1%
132 | 1 | < 0.1%
159 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
934266 | 1 | < 0.1%
922543 | 1 | < 0.1%
915010 | 2 | < 0.1%
904750 | 1 | < 0.1%
886967 | 1 | < 0.1%

% Attendance
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 665
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 91.29376632
Minimum: 0.7
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 0.7
5-th percentile: 81.1
Q1: 89.4
median: 92.6
Q3: 94.7
95-th percentile: 96.9
Maximum: 100
Range: 99.3
Interquartile range (IQR): 5.3

Descriptive statistics

Standard deviation: 5.341073645
Coefficient of variation (CV): 0.05850425346
Kurtosis: 10.96648497
Mean: 91.29376632
Median Absolute Deviation (MAD): 2.5
Skewness: -2.352002407
Sum: 55088119.3
Variance: 28.52706768
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
94.1 | 7549 | 1.3%
94.3 | 7362 | 1.2%
93.9 | 7359 | 1.2%
94.5 | 7344 | 1.2%
94 | 7330 | 1.2%
93.8 | 7325 | 1.2%
94.6 | 7300 | 1.2%
94.2 | 7278 | 1.2%
93.6 | 7241 | 1.2%
94.4 | 7235 | 1.2%
Other values (655) | 530093 | 87.8%

Minimum 5 values
Value | Count | Frequency (%)
0.7 | 1 | < 0.1%
1.7 | 1 | < 0.1%
9.9 | 1 | < 0.1%
12 | 1 | < 0.1%
12.1 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
100 | 8 | < 0.1%
99.9 | 3 | < 0.1%
99.8 | 9 | < 0.1%
99.7 | 14 | < 0.1%
99.6 | 19 | < 0.1%

% Chronically Absent
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 984
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.72095188
Minimum: 0
Maximum: 100
Zeros: 18887
Zeros (%): 3.1%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.1
Q1: 13.5
median: 25
Q3: 39.4
95-th percentile: 61.3
Maximum: 100
Range: 100
Interquartile range (IQR): 25.9

Descriptive statistics

Standard deviation: 18.06207498
Coefficient of variation (CV): 0.6515676324
Kurtosis: 0.04975180387
Mean: 27.72095188
Median Absolute Deviation (MAD): 12.5
Skewness: 0.6709151033
Sum: 16727265.9
Variance: 326.2385527
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 18887 | 3.1%
33.3 | 10646 | 1.8%
50 | 9534 | 1.6%
25 | 8414 | 1.4%
16.7 | 7247 | 1.2%
20 | 6648 | 1.1%
14.3 | 6222 | 1.0%
28.6 | 5454 | 0.9%
12.5 | 5211 | 0.9%
40 | 4670 | 0.8%
Other values (974) | 520483 | 86.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 18887 | 3.1%
0.2 | 1 | < 0.1%
0.3 | 11 | < 0.1%
0.4 | 41 | < 0.1%
0.5 | 44 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
100 | 336 | 0.1%
99 | 1 | < 0.1%
98.9 | 1 | < 0.1%
98.6 | 1 | < 0.1%
98.5 | 2 | < 0.1%

Next Year % Chronically Absent
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 972
Distinct (%): 0.2%
Missing: 149460
Missing (%): 24.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.37607984
Minimum: 0
Maximum: 100
Zeros: 11965
Zeros (%): 2.0%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.4
Q1: 13.3
median: 25
Q3: 38.9
95-th percentile: 60
Maximum: 100
Range: 100
Interquartile range (IQR): 25.6

Descriptive statistics

Standard deviation: 17.70309592
Coefficient of variation (CV): 0.6466629271
Kurtosis: 0.03709800431
Mean: 27.37607984
Median Absolute Deviation (MAD): 12.5
Skewness: 0.6705077007
Sum: 12427535.7
Variance: 313.3996053
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 11965 | 2.0%
33.3 | 7139 | 1.2%
50 | 6281 | 1.0%
25 | 5911 | 1.0%
20 | 4804 | 0.8%
16.7 | 4622 | 0.8%
14.3 | 4048 | 0.7%
28.6 | 3574 | 0.6%
12.5 | 3532 | 0.6%
40 | 3360 | 0.6%
Other values (962) | 398720 | 66.1%
(Missing) | 149460 | 24.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 11965 | 2.0%
0.2 | 1 | < 0.1%
0.3 | 6 | < 0.1%
0.4 | 39 | < 0.1%
0.5 | 35 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
100 | 148 | < 0.1%
98.6 | 1 | < 0.1%
98.5 | 1 | < 0.1%
98.3 | 1 | < 0.1%
98.1 | 1 | < 0.1%

Chronically_Absent_Next_Year
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 149460
Missing (%): 24.8%
Memory size: 9.2 MiB

Medium | 302151
High | 76651
Low | 75154

Length

Max length: 6
Median length: 6
Mean length: 5.165637198
Min length: 3

Characters and Unicode

Total characters: 2344972
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Medium
2nd row: High
3rd row: Medium
4th row: Medium
5th row: Medium
Value | Count | Frequency (%)
Medium | 302151 | 50.1%
High | 76651 | 12.7%
Low | 75154 | 12.5%
(Missing) | 149460 | 24.8%
Histogram of lengths of the category
Value | Count | Frequency (%)
medium | 302151 | 66.6%
high | 76651 | 16.9%
low | 75154 | 16.6%
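
The same three counts appear with two different sets of percentages: once relative to all rows (missing values included) and once relative to the non-missing rows only. A short sketch, using the counts reported here, shows both denominators; reading the second table as excluding missing rows is an inference from the numbers rather than something the report states.

```python
n_observations = 603416
missing = 149460
counts = {"Medium": 302151, "High": 76651, "Low": 75154}

# The three categories account for every non-missing row.
non_missing = n_observations - missing
assert sum(counts.values()) == non_missing

shares = {}
for label, count in counts.items():
    of_all = round(100 * count / n_observations, 1)   # denominator: all rows
    of_present = round(100 * count / non_missing, 1)  # denominator: non-missing rows
    shares[label] = (of_all, of_present)
    print(f"{label}: {of_all}% of all rows, {of_present}% of non-missing rows")
```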

Most occurring characters

ValueCountFrequency (%)
i378802
16.2%
M302151
12.9%
e302151
12.9%
d302151
12.9%
u302151
12.9%
m302151
12.9%
H76651
 
3.3%
g76651
 
3.3%
h76651
 
3.3%
L75154
 
3.2%
Other values (2)150308
 
6.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1891016
80.6%
Uppercase Letter453956
 
19.4%

Most frequent character per category

ValueCountFrequency (%)
i378802
20.0%
e302151
16.0%
d302151
16.0%
u302151
16.0%
m302151
16.0%
g76651
 
4.1%
h76651
 
4.1%
o75154
 
4.0%
w75154
 
4.0%
ValueCountFrequency (%)
M302151
66.6%
H76651
 
16.9%
L75154
 
16.6%

Most occurring scripts

ValueCountFrequency (%)
Latin2344972
100.0%

Most frequent character per script

ValueCountFrequency (%)
i378802
16.2%
M302151
12.9%
e302151
12.9%
d302151
12.9%
u302151
12.9%
m302151
12.9%
H76651
 
3.3%
g76651
 
3.3%
h76651
 
3.3%
L75154
 
3.2%
Other values (2)150308
 
6.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII2344972
100.0%

Most frequent character per block

ValueCountFrequency (%)
i378802
16.2%
M302151
12.9%
e302151
12.9%
d302151
12.9%
u302151
12.9%
m302151
12.9%
H76651
 
3.3%
g76651
 
3.3%
h76651
 
3.3%
L75154
 
3.2%
Other values (2)150308
 
6.4%

District_Number
Real number (ℝ≥0)

Distinct: 33
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.97855211
Minimum: 1
Maximum: 75
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 9
median: 17
Q3: 26
95-th percentile: 32
Maximum: 75
Range: 74
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 14.97024431
Coefficient of variation (CV): 0.7887980193
Kurtosis: 5.570359878
Mean: 18.97855211
Median Absolute Deviation (MAD): 9
Skewness: 2.009067137
Sum: 11451962
Variance: 224.1082148
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=33)
Common values
Value | Count | Frequency (%)
2 | 39354 | 6.5%
10 | 31438 | 5.2%
31 | 29853 | 4.9%
27 | 26607 | 4.4%
75 | 25365 | 4.2%
11 | 23440 | 3.9%
9 | 22973 | 3.8%
24 | 22276 | 3.7%
28 | 20618 | 3.4%
30 | 20105 | 3.3%
Other values (23) | 341387 | 56.6%

Minimum 5 values
Value | Count | Frequency (%)
1 | 11120 | 1.8%
2 | 39354 | 6.5%
3 | 16474 | 2.7%
4 | 12827 | 2.1%
5 | 10645 | 1.8%

Maximum 5 values
Value | Count | Frequency (%)
75 | 25365 | 4.2%
32 | 9275 | 1.5%
31 | 29853 | 4.9%
30 | 20105 | 3.3%
29 | 18368 | 3.0%

Borough_Code
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 MiB

K | 178769
Q | 148530
X | 133381
M | 110824
R | 31912

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 603416
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: M
4th row: M
5th row: M
Value | Count | Frequency (%)
K | 178769 | 29.6%
Q | 148530 | 24.6%
X | 133381 | 22.1%
M | 110824 | 18.4%
R | 31912 | 5.3%
Histogram of lengths of the category
ValueCountFrequency (%)
k178769
29.6%
q148530
24.6%
x133381
22.1%
m110824
18.4%
r31912
 
5.3%

Most occurring characters

ValueCountFrequency (%)
K178769
29.6%
Q148530
24.6%
X133381
22.1%
M110824
18.4%
R31912
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter603416
100.0%

Most frequent character per category

ValueCountFrequency (%)
K178769
29.6%
Q148530
24.6%
X133381
22.1%
M110824
18.4%
R31912
 
5.3%

Most occurring scripts

ValueCountFrequency (%)
Latin603416
100.0%

Most frequent character per script

ValueCountFrequency (%)
K178769
29.6%
Q148530
24.6%
X133381
22.1%
M110824
18.4%
R31912
 
5.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII603416
100.0%

Most frequent character per block

ValueCountFrequency (%)
K178769
29.6%
Q148530
24.6%
X133381
22.1%
M110824
18.4%
R31912
 
5.3%

School_Number
Real number (ℝ≥0)

Distinct: 650
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 250.0509814
Minimum: 1
Maximum: 993
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 17
Q1: 94
median: 205
Q3: 367
95-th percentile: 627
Maximum: 993
Range: 992
Interquartile range (IQR): 273

Descriptive statistics

Standard deviation: 195.0448132
Coefficient of variation (CV): 0.7800201866
Kurtosis: 0.3921586779
Mean: 250.0509814
Median Absolute Deviation (MAD): 127
Skewness: 0.931765877
Sum: 150884763
Variance: 38042.47914
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
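Common values
Value | Count | Frequency (%)
20 | 2599 | 0.4%
48 | 2516 | 0.4%
4 | 2447 | 0.4%
11 | 2445 | 0.4%
46 | 2396 | 0.4%
75 | 2366 | 0.4%
138 | 2362 | 0.4%
9 | 2348 | 0.4%
41 | 2260 | 0.4%
36 | 2247 | 0.4%
Other values (640) | 579430 | 96.0%

Minimum 5 values
Value | Count | Frequency (%)
1 | 1853 | 0.3%
2 | 1476 | 0.2%
3 | 1875 | 0.3%
4 | 2447 | 0.4%
5 | 1924 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
993 | 797 | 0.1%
971 | 366 | 0.1%
964 | 535 | 0.1%
933 | 157 | < 0.1%
907 | 14 | < 0.1%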

Borough_Name
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 MiB

Brooklyn | 178769
Queens | 148530
Bronx | 133381
Manhattan | 110824
Staten Island | 31912
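
Borough_Name and Borough_Code report identical counts per value (for example, K and Brooklyn both have 178769 rows), which is why the two are flagged as highly correlated. The counts elsewhere in this report also suggest that District_Number, Borough_Code and School_Number are the three parts of the 6-character DBN. The sketch below illustrates that reading; the two-digit, one-letter, three-digit layout and the helper name `split_dbn` are assumptions inferred from the report, not documented facts about the dataset.

```python
# Borough letter to name; pairs are inferred from the identical counts reported
# for Borough_Code and Borough_Name in this report.
BOROUGH_NAMES = {
    "K": "Brooklyn",
    "Q": "Queens",
    "X": "Bronx",
    "M": "Manhattan",
    "R": "Staten Island",
}

def split_dbn(dbn):
    """Split a 6-character DBN into (district, borough code, school number).

    Assumes the layout is two digits, one letter, three digits.
    """
    if len(dbn) != 6:
        raise ValueError(f"expected 6 characters, got {dbn!r}")
    district = int(dbn[:2])
    borough = dbn[2]
    school = int(dbn[3:])
    return district, borough, school

district, borough, school = split_dbn("75Q993")
print(district, borough, BOROUGH_NAMES[borough], school)
```

Under this reading, DBN 75Q993 maps to district 75 and school number 993, which matches the maximum values reported for District_Number and School_Number.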

Length

Max length: 13
Median length: 8
Mean length: 7.29266211
Min length: 5

Characters and Unicode

Total characters: 4400509
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Manhattan
2nd row: Manhattan
3rd row: Manhattan
4th row: Manhattan
5th row: Manhattan
Value | Count | Frequency (%)
Brooklyn | 178769 | 29.6%
Queens | 148530 | 24.6%
Bronx | 133381 | 22.1%
Manhattan | 110824 | 18.4%
Staten Island | 31912 | 5.3%
Histogram of lengths of the category
ValueCountFrequency (%)
brooklyn178769
28.1%
queens148530
23.4%
bronx133381
21.0%
manhattan110824
17.4%
staten31912
 
5.0%
island31912
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n746152
17.0%
o490919
11.2%
a396296
9.0%
e328972
 
7.5%
B312150
 
7.1%
r312150
 
7.1%
t285472
 
6.5%
l210681
 
4.8%
s180442
 
4.1%
k178769
 
4.1%
Other values (10)958506
21.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3733269
84.8%
Uppercase Letter635328
 
14.4%
Space Separator31912
 
0.7%

Most frequent character per category

ValueCountFrequency (%)
n746152
20.0%
o490919
13.1%
a396296
10.6%
e328972
8.8%
r312150
8.4%
t285472
 
7.6%
l210681
 
5.6%
s180442
 
4.8%
k178769
 
4.8%
y178769
 
4.8%
Other values (4)424647
11.4%
ValueCountFrequency (%)
B312150
49.1%
Q148530
23.4%
M110824
 
17.4%
S31912
 
5.0%
I31912
 
5.0%
ValueCountFrequency (%)
31912
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4368597
99.3%
Common31912
 
0.7%

Most frequent character per script

ValueCountFrequency (%)
n746152
17.1%
o490919
11.2%
a396296
9.1%
e328972
 
7.5%
B312150
 
7.1%
r312150
 
7.1%
t285472
 
6.5%
l210681
 
4.8%
s180442
 
4.1%
k178769
 
4.1%
Other values (9)926594
21.2%
ValueCountFrequency (%)
31912
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4400509
100.0%

Most frequent character per block

ValueCountFrequency (%)
n746152
17.0%
o490919
11.2%
a396296
9.0%
e328972
 
7.5%
B312150
 
7.1%
r312150
 
7.1%
t285472
 
6.5%
l210681
 
4.8%
s180442
 
4.1%
k178769
 
4.1%
Other values (10)958506
21.8%

% Attendance_2_yr_avg
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1170
Distinct (%): 0.2%
Missing: 20151
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 91.27576033
Minimum: 1.7
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 1.7
5-th percentile: 81.55
Q1: 89.45
median: 92.5
Q3: 94.55
95-th percentile: 96.75
Maximum: 100
Range: 98.3
Interquartile range (IQR): 5.1

Descriptive statistics

Standard deviation: 5.04690642
Coefficient of variation (CV): 0.05529295403
Kurtosis: 8.920475652
Mean: 91.27576033
Median Absolute Deviation (MAD): 2.4
Skewness: -2.154936517
Sum: 53237956.35
Variance: 25.47126442
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
93.64102
 
0.7%
93.44065
 
0.7%
93.94048
 
0.7%
944032
 
0.7%
93.13965
 
0.7%
94.43947
 
0.7%
94.63935
 
0.7%
94.53933
 
0.7%
93.53862
 
0.6%
95.13738
 
0.6%
Other values (1160)543638
90.1%
(Missing)20151
 
3.3%
Value | Count | Frequency (%)
1.7 | 1 | < 0.1%
16.4 | 4 | < 0.1%
23.9 | 4 | < 0.1%
25.1 | 1 | < 0.1%
28.6 | 2 | < 0.1%

Value | Count | Frequency (%)
100 | 10 | < 0.1%
99.85 | 6 | < 0.1%
99.8 | 6 | < 0.1%
99.75 | 4 | < 0.1%
99.7 | 2 | < 0.1%
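
Several of the statistics reported for % Attendance_2_yr_avg are tied together by simple identities, which makes a quick consistency check possible. The sketch below copies the reported figures into hypothetical variable names (they are not part of the report) and recomputes the derived quantities. It assumes the mean and sum are taken over non-missing rows only.

```python
# Figures copied from the report for % Attendance_2_yr_avg.
# All variable names here are illustrative, not part of the report.
n_observations = 603416
n_missing = 20151
total = 53237956.35          # "Sum"
mean = 91.27576033
std = 5.04690642             # "Standard deviation"
q1, q3 = 89.45, 94.55
minimum, maximum = 1.7, 100

# Assumption: sum and mean are computed over non-missing rows only.
n_valid = n_observations - n_missing

derived = {
    "mean": total / n_valid,                      # Sum / non-missing count
    "variance": std ** 2,                         # square of the standard deviation
    "cv": std / mean,                             # coefficient of variation
    "iqr": q3 - q1,                               # interquartile range
    "range": maximum - minimum,
    "missing_pct": 100 * n_missing / n_observations,
}

for name, value in derived.items():
    print(f"{name}: {value:.6g}")
```

Each recomputed value should agree with the corresponding reported figure up to rounding of the displayed digits.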

% Chronically Absent_2_yr_avg
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 2674
Distinct (%): 0.5%
Missing: 20151
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.00867402
Minimum: 0
Maximum: 100
Zeros: 7352
Zeros (%): 1.2%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 4.5
Q1: 14.55
Median: 25.6
Q3: 39.35
95-th percentile: 59.2
Maximum: 100
Range: 100
Interquartile range (IQR): 24.8

Descriptive statistics

Standard deviation: 17.0425317
Coefficient of variation (CV): 0.6084733499
Kurtosis: -0.07234879048
Mean: 28.00867402
Median Absolute Deviation (MAD): 12.1
Skewness: 0.6118168871
Sum: 16336479.25
Variance: 290.4478869
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 7352 | 1.2%
50 | 3176 | 0.5%
25 | 2841 | 0.5%
33.3 | 2556 | 0.4%
12.5 | 2036 | 0.3%
20 | 1919 | 0.3%
14.3 | 1888 | 0.3%
16.7 | 1842 | 0.3%
28.6 | 1797 | 0.3%
40 | 1561 | 0.3%
Other values (2664) | 556297 | 92.2%
(Missing) | 20151 | 3.3%
ValueCountFrequency (%)
07352
1.2%
0.156
 
< 0.1%
0.26
 
< 0.1%
0.256
 
< 0.1%
0.311
 
< 0.1%
ValueCountFrequency (%)
10073
< 0.1%
98.54
 
< 0.1%
97.354
 
< 0.1%
96.83
 
< 0.1%
96.61
 
< 0.1%

% Attendance_5_yr_avg
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6297
Distinct (%): 1.0%
Missing: 2506
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 91.3265413
Minimum: 0.7
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 0.7
5-th percentile: 81.46
Q1: 89.55
Median: 92.54
Q3: 94.6
95-th percentile: 96.72
Maximum: 100
Range: 99.3
Interquartile range (IQR): 5.05

Descriptive statistics

Standard deviation: 5.021769394
Coefficient of variation (CV): 0.0549869657
Kurtosis: 10.31706588
Mean: 91.3265413
Median Absolute Deviation (MAD): 2.36
Skewness: -2.223978674
Sum: 54879031.93
Variance: 25.21816784
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
94.6 | 1678 | 0.3%
94.1 | 1627 | 0.3%
94.5 | 1569 | 0.3%
94.4 | 1534 | 0.3%
94.9 | 1530 | 0.3%
94.7 | 1500 | 0.2%
93.9 | 1470 | 0.2%
94.3 | 1460 | 0.2%
93.5 | 1455 | 0.2%
92.5 | 1448 | 0.2%
Other values (6287) | 585639 | 97.1%
(Missing) | 2506 | 0.4%
Value | Count | Frequency (%)
0.7 | 1 | < 0.1%
1.7 | 1 | < 0.1%
13.2 | 1 | < 0.1%
13.8 | 1 | < 0.1%
16 | 1 | < 0.1%

Value | Count | Frequency (%)
100 | 4 | < 0.1%
99.7 | 8 | < 0.1%
99.6 | 3 | < 0.1%
99.55 | 3 | < 0.1%
99.5 | 2 | < 0.1%

% Chronically Absent_5_yr_avg
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 13754
Distinct (%): 2.3%
Missing: 2506
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.45054814
Minimum: 0
Maximum: 100
Zeros: 3449
Zeros (%): 0.6%
Memory size: 9.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 4.825
Q1: 14.3
Median: 25.18
Q3: 38.35
95-th percentile: 57.8
Maximum: 100
Range: 100
Interquartile range (IQR): 24.05

Descriptive statistics

Standard deviation: 16.51960436
Coefficient of variation (CV): 0.6017950635
Kurtosis: -0.05128493758
Mean: 27.45054814
Median Absolute Deviation (MAD): 11.78
Skewness: 0.6204422603
Sum: 16495308.88
Variance: 272.8973282
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 3449 | 0.6%
33.3 | 904 | 0.1%
50 | 902 | 0.1%
16.7 | 891 | 0.1%
14.3 | 873 | 0.1%
25 | 789 | 0.1%
12.5 | 667 | 0.1%
28.6 | 633 | 0.1%
11.1 | 570 | 0.1%
22.2 | 516 | 0.1%
Other values (13744) | 590716 | 97.9%
(Missing) | 2506 | 0.4%
ValueCountFrequency (%)
03449
0.6%
0.226
 
< 0.1%
0.246
 
< 0.1%
0.2510
 
< 0.1%
0.266
 
< 0.1%
ValueCountFrequency (%)
100112
< 0.1%
99.52
 
< 0.1%
99.466666673
 
< 0.1%
99.256
 
< 0.1%
99.166666673
 
< 0.1%

Interactions

[110 interaction plots (image/svg+xml, Matplotlib v3.3.2) omitted]

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
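
As a minimal sketch of that calculation, the function below divides the covariance by the product of the standard deviations using only the Python standard library. The function name `pearson_r` is hypothetical, and the five value pairs are copied from the % Attendance and % Chronically Absent columns of rows 0, 1, 2, 3 and 7 in the Sample section. This is not how the report itself computes the coefficient.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors of the covariance and the two standard deviations
    # cancel, so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Five (% Attendance, % Chronically Absent) pairs from the sample rows.
attendance = [92.0, 88.1, 90.7, 92.2, 95.5]
chronically_absent = [26.9, 53.3, 29.5, 31.0, 7.7]

r = pearson_r(attendance, chronically_absent)

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * a + 10 for a in attendance], chronically_absent)
print(r, r_rescaled)
```

Higher attendance goes with lower chronic absence in these rows, so r comes out negative, and the rescaled series gives the same value up to floating-point error.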

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
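
The rank-based definition can be sketched in plain Python by ranking each variable and then applying the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their ranks is an assumption (a common convention, not something the report states).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship (y = x**3): rho is 1 even though
# the relationship is not a straight line.
xs = [1, 2, 3, 4, 5]
ys = [v ** 3 for v in xs]
rho = spearman_rho(xs, ys)
print(rho)
```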

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
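
The pair-counting description corresponds to the variant usually called tau-a. The sketch below implements exactly that description; the function name `kendall_tau_a` is hypothetical. Library implementations often apply a tie correction instead (SciPy's `kendalltau`, for example, defaults to the tau-b variant), so results can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```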

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
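
A minimal sketch of a bias-corrected Cramér's V follows, assuming the commonly cited form of Bergsma's correction: the chi-squared statistic is computed without continuity correction, φ² is reduced by (k-1)(r-1)/(n-1), and the row and column counts are shrunk in the same spirit. The function name `cramers_v_corrected` and the toy borough data are hypothetical, and the report's own implementation may differ in detail.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # association is not meaningful for a constant variable

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction (assumed form, see lead-in).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy data: a code that maps one-to-one onto a name (perfect association),
# loosely modelled on Borough_Code and Borough_Name.
codes = ["M", "X", "K", "Q", "R"] * 20
names_by_code = {"M": "Manhattan", "X": "Bronx", "K": "Brooklyn",
                 "Q": "Queens", "R": "Staten Island"}
names = [names_by_code[c] for c in codes]
v_perfect = cramers_v_corrected(codes, names)

# Toy data: two labels that vary independently of each other.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25,
                                    ["u", "v", "u", "v"] * 25)
print(v_perfect, v_independent)
```

A one-to-one mapping such as this yields a value of 1, which is consistent with the high-correlation warning the report raises for Borough_Code and Borough_Name.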

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
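
To make the first and third of these views concrete, the sketch below counts missing cells per column and computes a nullity correlation as the Pearson correlation between 0/1 "value present" indicators. The toy rows, the column names and the function name `nullity_correlation` are all hypothetical, and treating nullity correlation as a correlation of presence indicators is an assumption about how the heatmap is built.

```python
import math

# Toy rows (hypothetical values); None marks a missing cell.
rows = [
    {"attendance": 92.0, "next_year_pct": 23.4, "next_year_label": "Medium"},
    {"attendance": 88.1, "next_year_pct": 65.2, "next_year_label": "High"},
    {"attendance": None, "next_year_pct": None, "next_year_label": None},
    {"attendance": 81.8, "next_year_pct": None, "next_year_label": None},
    {"attendance": 84.2, "next_year_pct": 30.0, "next_year_label": "Medium"},
]
columns = list(rows[0])

# Presence indicators per column: 1 = value present, 0 = missing.
present = {c: [0 if row[c] is None else 1 for row in rows] for c in columns}

# Nullity by column: number of missing cells in each column.
missing_counts = {c: len(rows) - sum(flags) for c, flags in present.items()}


def nullity_correlation(a, b):
    """Pearson correlation between two 0/1 presence-indicator lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a column that is never (or always) missing
    return cov / math.sqrt(var_a * var_b)


paired = nullity_correlation(present["next_year_pct"], present["next_year_label"])
print(missing_counts, paired)
```

Two columns that are always missing together, as the two next-year columns are in the last rows of the sample, give a nullity correlation of 1.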

Sample

First rows

(index) | DBN | School Name | Grade | Year | Demographic Variable | # Days Absent | # Days Present | % Attendance | % Chronically Absent | Next Year % Chronically Absent | Chronically_Absent_Next_Year | District_Number | Borough_Code | School_Number | Borough_Name | % Attendance_2_yr_avg | % Chronically Absent_2_yr_avg | % Attendance_5_yr_avg | % Chronically Absent_5_yr_avg
0 | 01M015 | P.S. 015 Roberto Clemente | All Grades | 2013-14 | All Students | 2783.0 | 32020.0 | 92.0 | 26.9 | 23.4 | Medium | 01 | M | 015 | Manhattan | 93.65 | 21.95 | 93.060 | 24.320
1 | 01M015 | P.S. 015 Roberto Clemente | PK in K-12 Schools | 2013-14 | All Students | 560.0 | 4151.0 | 88.1 | 53.3 | 65.2 | High | 01 | M | 015 | Manhattan | 92.60 | 22.20 | 88.775 | 47.675
2 | 01M015 | P.S. 015 Roberto Clemente | 0K | 2013-14 | All Students | 659.0 | 6414.0 | 90.7 | 29.5 | 30.0 | Medium | 01 | M | 015 | Manhattan | 92.55 | 30.45 | 91.720 | 30.560
3 | 01M015 | P.S. 015 Roberto Clemente | 1 | 2013-14 | All Students | 525.0 | 6214.0 | 92.2 | 31.0 | 18.8 | Medium | 01 | M | 015 | Manhattan | 93.95 | 16.45 | 93.160 | 25.640
4 | 01M015 | P.S. 015 Roberto Clemente | 2 | 2013-14 | All Students | 308.0 | 3680.0 | 92.3 | 20.0 | 21.9 | Medium | 01 | M | 015 | Manhattan | 93.75 | 20.60 | 93.620 | 19.540
5 | 01M015 | P.S. 015 Roberto Clemente | 3 | 2013-14 | All Students | 239.0 | 3084.0 | 92.8 | 20.0 | 10.5 | Medium | 01 | M | 015 | Manhattan | 94.30 | 20.70 | 93.800 | 18.380
6 | 01M015 | P.S. 015 Roberto Clemente | 4 | 2013-14 | All Students | 301.0 | 4390.0 | 93.6 | 17.2 | 11.1 | Medium | 01 | M | 015 | Manhattan | 93.65 | 20.55 | 94.200 | 16.240
7 | 01M015 | P.S. 015 Roberto Clemente | 5 | 2013-14 | All Students | 191.0 | 4087.0 | 95.5 | 7.7 | 7.4 | Low | 01 | M | 015 | Manhattan | 95.00 | 10.20 | 95.160 | 10.440
8 | 01M019 | P.S. 019 Asher Levy | All Grades | 2013-14 | All Students | 4070.0 | 46883.0 | 92.0 | 26.4 | 30.5 | Medium | 01 | M | 019 | Manhattan | 90.95 | 34.10 | 91.360 | 31.940
9 | 01M019 | P.S. 019 Asher Levy | PK in K-12 Schools | 2013-14 | All Students | 664.0 | 5498.0 | 89.2 | 37.8 | 32.3 | Medium | 01 | M | 019 | Manhattan | 88.50 | 33.60 | 89.140 | 39.640

Last rows

(index) | DBN | School Name | Grade | Year | Demographic Variable | # Days Absent | # Days Present | % Attendance | % Chronically Absent | Next Year % Chronically Absent | Chronically_Absent_Next_Year | District_Number | Borough_Code | School_Number | Borough_Name | % Attendance_2_yr_avg | % Chronically Absent_2_yr_avg | % Attendance_5_yr_avg | % Chronically Absent_5_yr_avg
603406 | 75X811 | P.S. X811 | All Grades | 2018-19 | ELL | 7728.0 | 34631.0 | 81.8 | 62.7 | NaN | NaN | 75 | X | 811 | Bronx | 81.95 | 63.90 | 83.540000 | 59.660000
603407 | 75X811 | P.S. X811 | All Grades | 2018-19 | Not ELL | 11025.0 | 58922.0 | 84.2 | 51.7 | NaN | NaN | 75 | X | 811 | Bronx | 84.15 | 52.15 | 83.980000 | 53.420000
603408 | 75X811 | P.S. X811 | 9 | 2018-19 | ELL | 598.0 | 3203.0 | 84.3 | 62.5 | NaN | NaN | 75 | X | 811 | Bronx | 82.80 | 60.40 | 84.925000 | 51.900000
603409 | 75X811 | P.S. X811 | 9 | 2018-19 | Not ELL | 1848.0 | 8428.0 | 82.0 | 60.9 | NaN | NaN | 75 | X | 811 | Bronx | 84.90 | 50.00 | 85.125000 | 50.400000
603410 | 75X811 | P.S. X811 | 10 | 2018-19 | ELL | 1222.0 | 7116.0 | 85.3 | 49.0 | NaN | NaN | 75 | X | 811 | Bronx | 79.00 | 68.40 | 82.733333 | 56.266667
603411 | 75X811 | P.S. X811 | 10 | 2018-19 | Not ELL | 1899.0 | 10105.0 | 84.2 | 52.8 | NaN | NaN | 75 | X | 811 | Bronx | 85.90 | 45.20 | 85.333333 | 48.733333
603412 | 75X811 | P.S. X811 | 11 | 2018-19 | ELL | 1071.0 | 4239.0 | 79.8 | 66.7 | NaN | NaN | 75 | X | 811 | Bronx | 84.90 | 61.90 | 86.600000 | 53.666667
603413 | 75X811 | P.S. X811 | 11 | 2018-19 | Not ELL | 1359.0 | 8382.0 | 86.0 | 55.2 | NaN | NaN | 75 | X | 811 | Bronx | 82.90 | 53.50 | 83.033333 | 51.300000
603414 | 75X811 | P.S. X811 | 12 | 2018-19 | ELL | 4837.0 | 20073.0 | 80.6 | 66.7 | NaN | NaN | 75 | X | 811 | Bronx | 82.55 | 62.65 | 82.880000 | 63.720000
603415 | 75X811 | P.S. X811 | 12 | 2018-19 | Not ELL | 5849.0 | 31639.0 | 84.4 | 47.4 | NaN | NaN | 75 | X | 811 | Bronx | 83.10 | 53.35 | 83.380000 | 55.440000